Dyer
Advancing Grounded Multimodal Named Entity Recognition via LLM-Based Reformulation and Box-Based Segmentation
Li, Jinyuan, Li, Ziyan, Li, Han, Yu, Jianfei, Xia, Rui, Sun, Di, Pan, Gang
The Grounded Multimodal Named Entity Recognition (GMNER) task aims to identify named entities, their entity types, and their corresponding visual regions. GMNER exhibits two challenging attributes: 1) the tenuous correlation between images and text on social media means a notable proportion of named entities are ungroundable; 2) there is a distinction between the coarse-grained noun phrases used in similar tasks (e.g., phrase localization) and fine-grained named entities. In this paper, we propose RiVEG, a unified framework that reformulates GMNER into a joint MNER-VE-VG task by leveraging large language models (LLMs) as connecting bridges. This reformulation brings two benefits: 1) it enables us to optimize the MNER module for optimal MNER performance and eliminates the need to pre-extract region features using object detection methods, thus naturally addressing two major limitations of existing GMNER methods; 2) the introduction of the Entity Expansion Expression module and the Visual Entailment (VE) module unifies Visual Grounding (VG) and Entity Grounding (EG), endowing the proposed framework with unlimited data and model scalability. Furthermore, to address the potential ambiguity stemming from the coarse-grained bounding box output in GMNER, we construct the new Segmented Multimodal Named Entity Recognition (SMNER) task and the corresponding Twitter-SMNER dataset, aimed at generating fine-grained segmentation masks, and experimentally demonstrate the feasibility and effectiveness of using the box prompt-based Segment Anything Model (SAM) to empower any GMNER model with the ability to accomplish the SMNER task. Extensive experiments demonstrate that RiVEG significantly outperforms SoTA methods on four datasets across the MNER, GMNER, and SMNER tasks.
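The staged reformulation described in the abstract (MNER → entity expansion → VE filtering → VG) can be sketched as a pipeline of toy stand-in functions. Everything below is hypothetical illustration, not the paper's actual models: the entity, box coordinates, and the template used by `expand_entity` are invented for the example.

```python
# Hypothetical sketch of a RiVEG-style MNER-VE-VG pipeline.
# Each function is a toy stand-in for a learned module.

def mner(text):
    # Stand-in for the MNER module: pretend we spotted one PER entity.
    return [("Lionel Messi", "PER")]

def expand_entity(entity, entity_type, text):
    # Stand-in for the LLM-based Entity Expansion Expression module:
    # turn a fine-grained named entity into a natural-language
    # referring expression that a VG model can consume.
    return f"the {entity_type.lower()} entity '{entity}' mentioned in: {text}"

def visual_entailment(image, expression):
    # Stand-in VE module: decides whether the expression is groundable
    # in the image at all (toy rule: any image counts as entailing).
    return image is not None

def visual_grounding(image, expression):
    # Stand-in VG module: returns a bounding box (x0, y0, x1, y1).
    return (10, 20, 110, 220)

def gmner(image, text):
    # Ungroundable entities (VE says no) get box = None, which is how
    # the framework handles the "notable proportion of ungroundable
    # named entities" attribute of the task.
    results = []
    for entity, etype in mner(text):
        expr = expand_entity(entity, etype, text)
        box = visual_grounding(image, expr) if visual_entailment(image, expr) else None
        results.append((entity, etype, box))
    return results
```

A box produced this way could then be passed to a box prompt-based segmenter (e.g., SAM) to obtain the fine-grained mask required by the SMNER task.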
- North America > United States > Indiana > Lake County > Dyer (0.04)
- Europe > United Kingdom > England (0.04)
- Media > Film (1.00)
- Leisure & Entertainment > Sports > Basketball (1.00)
A Triumvirate of AI Driven Theoretical Discovery
Recent years have seen a dramatic rise in the use of AI algorithms in pure mathematics and fundamental sciences such as theoretical physics. This is perhaps counter-intuitive, since the mathematical sciences require rigorous definitions, derivations, and proofs, in contrast to the experimental sciences, which rely on modelling data with error bars. In this Perspective, we categorize the approaches to mathematical discovery as "top-down", "bottom-up" and "meta-mathematics", as inspired by historical examples. We review some of the progress of the last few years, comparing and contrasting both the advances and the shortcomings of each approach. We argue that while the theorist is in no danger of being replaced by AI in the near future, the hybrid of human expertise and AI algorithms will become an integral part of theoretical discovery.
- South America > Brazil > Bahia > Salvador (0.04)
- North America > United States > Indiana > Lake County > Dyer (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Germany > Berlin (0.04)
- Research Report (0.40)
- Overview (0.34)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Rethinking Graph Masked Autoencoders through Alignment and Uniformity
Wang, Liang, Tao, Xiang, Liu, Qiang, Wu, Shu, Wang, Liang
Self-supervised learning on graphs can be bifurcated into contrastive and generative methods. Contrastive methods, also known as graph contrastive learning (GCL), have dominated graph self-supervised learning in the past few years, but the recent advent of the graph masked autoencoder (GraphMAE) rekindles the momentum behind generative methods. Despite the empirical success of GraphMAE, there is still a dearth of theoretical understanding regarding its efficacy. Moreover, while both generative and contrastive methods have been shown to be effective, their connections and differences have yet to be thoroughly investigated. Therefore, we theoretically build a bridge between GraphMAE and GCL, and prove that the node-level reconstruction objective in GraphMAE implicitly performs context-level GCL. Based on our theoretical analysis, we further identify the limitations of GraphMAE from the perspectives of alignment and uniformity, which are considered two key properties of high-quality representations in GCL. We point out that GraphMAE's alignment performance is restricted by the masking strategy, and that uniformity is not strictly guaranteed. To remedy these limitations, we propose an Alignment-Uniformity enhanced Graph Masked AutoEncoder, named AUG-MAE. Specifically, we propose an easy-to-hard adversarial masking strategy to provide hard-to-align samples, which improves alignment performance. Meanwhile, we introduce an explicit uniformity regularizer to ensure the uniformity of the learned representations. Experimental results on benchmark datasets demonstrate the superiority of our model over existing state-of-the-art methods.
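The two properties the abstract appeals to have standard definitions in the contrastive-learning literature (alignment as the mean distance between positive-pair embeddings, uniformity via a Gaussian potential over all pairs on the hypersphere). A minimal NumPy sketch of those standard forms, assuming L2-normalized embeddings; this is the generic formulation, not AUG-MAE's exact regularizer:

```python
import numpy as np

def alignment(x, y, alpha=2):
    # x, y: L2-normalized embeddings of positive pairs, shape (n, d).
    # Lower is better: positive pairs should map close together.
    return np.mean(np.linalg.norm(x - y, axis=1) ** alpha)

def uniformity(x, t=2):
    # Log of the mean pairwise Gaussian potential over distinct pairs.
    # Lower (more negative) values mean embeddings are spread more
    # evenly over the unit hypersphere.
    sq = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    iu = np.triu_indices(x.shape[0], k=1)
    return np.log(np.mean(np.exp(-t * sq[iu])))
```

An explicit uniformity regularizer in the abstract's sense would add a term like `uniformity(z)` to the reconstruction loss, trading some alignment for better spread.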
- North America > United States > Indiana > Lake County > Dyer (0.04)
- Asia > China (0.04)
On The Potential of The Fractal Geometry and The CNNs Ability to Encode it
Zini, Julia El, Musharrafieh, Bassel, Awad, Mariette
The fractal dimension provides a statistical index of object complexity by studying how a pattern changes with the measuring scale. Although useful in several classification tasks, the fractal dimension is under-explored in deep learning applications. In this work, we investigate the features learned by deep models and study whether these deep networks are able to encode features as complex and high-level as the fractal dimension. Specifically, we conduct a correlation analysis experiment to show that deep networks are unable to extract such a feature in any of their layers. We combine our analytical study with a human evaluation to investigate the differences between deep learning networks and models that operate solely on the fractal feature. Moreover, we show the effectiveness of fractal features in applications where the object structure is crucial for the classification task. We empirically show that training a shallow network on fractal features achieves performance comparable, and in specific cases even superior, to that of deep networks trained on raw data, while requiring fewer computational resources. Fractal features improved classification accuracy by 30% on average while requiring up to 84% less training time. We couple our empirical study with a complexity analysis of the computational cost of extracting the proposed fractal features, and we study their limitations.
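The fractal dimension of a binary image is commonly estimated by box counting: count the boxes that contain part of the pattern at several scales and fit the slope of log(count) against log(1/scale). A minimal NumPy sketch of that textbook estimator (the threshold and dyadic scale choices here are illustrative defaults, not the paper's exact feature extractor):

```python
import numpy as np

def box_counting_dimension(img, threshold=0.5):
    # img: 2D array. Estimate the Minkowski-Bouligand (box-counting)
    # dimension of the thresholded pattern by counting occupied boxes
    # at dyadic scales and fitting log(count) vs. log(1/scale).
    binary = img > threshold
    n = min(binary.shape)
    sizes = [2 ** k for k in range(1, int(np.log2(n)))]
    counts = []
    for size in sizes:
        # Crop to a multiple of the box size, then tile into boxes and
        # count boxes that contain at least one foreground pixel.
        h = (binary.shape[0] // size) * size
        w = (binary.shape[1] // size) * size
        blocks = binary[:h, :w].reshape(h // size, size, w // size, size)
        counts.append(np.count_nonzero(blocks.any(axis=(1, 3))))
    slope, _ = np.polyfit(np.log(1.0 / np.array(sizes)), np.log(counts), 1)
    return slope
```

A filled 2D region yields a dimension near 2, a line near 1, and classic fractals (e.g., a Sierpinski triangle) land strictly in between, which is what makes the value a compact complexity index.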
- Asia > Middle East > Lebanon > Beirut Governorate > Beirut (0.04)
- North America > United States > New York (0.04)
- North America > United States > Indiana > Lake County > Dyer (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.04)
- Health & Medicine > Therapeutic Area (0.93)
- Energy (0.93)
Sentiment analysis in Tourism: Fine-tuning BERT or sentence embeddings concatenation?
Bouabdallaoui, Ibrahim, Guerouate, Fatima, Bouhaddour, Samya, Saadi, Chaimae, Sbihi, Mohammed
Bidirectional Encoder Representations from Transformers (BERT) is undoubtedly among the most powerful techniques for Natural Language Processing tasks such as Named Entity Recognition, Question Answering, and Sentiment Analysis. However, traditional techniques still hold major potential for improving recent models, in particular word tokenization and embedding techniques, as well as the neural network architectures that are now at the core of every recent model. In this paper, we conduct a comparative study between fine-tuning BERT and concatenating two sentence embeddings to boost the performance of a stacked Bidirectional Long Short-Term Memory-Bidirectional Gated Recurrent Units model; both approaches are applied to sentiment analysis of shopping places in Morocco. We search for the best learning rate for both approaches and, for the second approach, compare the best optimizers for each sentence embedding combination.
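The second approach's core operation is simple: embed each sentence with two different encoders and join the vectors feature-wise before the recurrent classifier. A minimal NumPy sketch of that concatenation step; the encoder choices and dimensions are hypothetical, and the downstream BiLSTM-BiGRU stack is not shown:

```python
import numpy as np

def concat_sentence_embeddings(emb_a, emb_b):
    # emb_a, emb_b: sentence embeddings of the same batch produced by
    # two different encoders, shapes (n, d_a) and (n, d_b).
    # Feature-wise concatenation yields (n, d_a + d_b) inputs for the
    # stacked BiLSTM-BiGRU classifier.
    assert emb_a.shape[0] == emb_b.shape[0], "batch sizes must match"
    return np.concatenate([emb_a, emb_b], axis=1)
```

The appeal of this route over fine-tuning is that both encoders stay frozen: only the comparatively small recurrent head is trained, so the learning-rate and optimizer search the abstract describes is cheap to run per embedding combination.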
- Africa > Middle East > Morocco (0.25)
- North America > United States > Indiana > Lake County > Dyer (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Increasing The Performance of Cognitively Inspired Data-Efficient Language Models via Implicit Structure Building
Momen, Omar, Arps, David, Kallmeyer, Laura
In this paper, we describe our submission to the BabyLM Challenge 2023 shared task on data-efficient language model (LM) pretraining (Warstadt et al., 2023). We train transformer-based masked language models that incorporate unsupervised predictions about hierarchical sentence structure into the model architecture. Concretely, we use the StructFormer architecture (Shen et al., 2021) and variants thereof. StructFormer models have been shown to perform well on unsupervised syntactic induction based on limited pretraining data, and to yield performance improvements over a vanilla transformer architecture (Shen et al., 2021). Evaluation of our models on 39 tasks provided by the BabyLM challenge shows promising improvements for models that integrate a hierarchical bias into the architecture on some particular tasks, even though they fail to consistently outperform the RoBERTa baseline model provided by the shared task organizers across all tasks.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Italy > Tuscany > Florence (0.04)
- North America > United States > Indiana > Lake County > Dyer (0.04)
- (6 more...)
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > United States > Indiana > Lake County > Dyer (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- (2 more...)
Heterogeneous Graph Contrastive Multi-view Learning
Wang, Zehong, Li, Qi, Yu, Donghua, Han, Xiaolong, Gao, Xiao-Zhi, Shen, Shigen
Inspired by the success of contrastive learning (CL) in computer vision and natural language processing, graph contrastive learning (GCL) has been developed to learn discriminative node representations on graph datasets. However, the development of GCL on Heterogeneous Information Networks (HINs) is still in its infancy. For example, it is unclear how to augment HINs without substantially altering the underlying semantics, or how to design the contrastive objective to fully capture the rich semantics. Moreover, early investigations demonstrate that CL suffers from sampling bias, whereas conventional debiasing techniques are empirically shown to be inadequate for GCL. How to mitigate sampling bias for heterogeneous GCL is another important problem. To address these challenges, we propose a novel Heterogeneous Graph Contrastive Multi-view Learning (HGCML) model. In particular, we use metapaths as the augmentation to generate multiple subgraphs as multi-views, and propose a contrastive objective to maximize the mutual information between any pair of metapath-induced views. To alleviate sampling bias, we further propose a positive sampling strategy that explicitly selects positives for each node by jointly considering the semantic and structural information preserved in each metapath view. Extensive experiments demonstrate that HGCML consistently outperforms state-of-the-art baselines on five real-world benchmark datasets.
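The contrastive objective between two metapath-induced views is typically an InfoNCE-style loss: node i's embedding in one view is pulled toward its counterpart in the other view and pushed away from every other node. A minimal NumPy sketch of that standard form, assuming dense embedding matrices for two views; HGCML's positive-sampling variant replaces the fixed i-to-i positives, which this sketch does not model:

```python
import numpy as np

def info_nce(z1, z2, tau=0.5):
    # z1, z2: node embeddings from two metapath-induced views, (n, d).
    # Node i in view 1 treats node i in view 2 as its positive and all
    # other nodes as negatives (standard InfoNCE, cosine similarity).
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / tau                        # (n, n) similarities
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))           # mean -log p(positive)
```

Maximizing mutual information between views then amounts to minimizing this loss summed over all pairs of metapath views.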
- Asia > China > Jiangsu Province > Nanjing (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Indiana > Lake County > Dyer (0.04)
- (2 more...)
Affinity and Diversity: Quantifying Mechanisms of Data Augmentation
Gontijo-Lopes, Raphael, Smullin, Sylvia J., Cubuk, Ekin D., Dyer, Ethan
Though data augmentation has become a standard component of deep neural network training, the underlying mechanism behind the effectiveness of these techniques remains poorly understood. In practice, augmentation policies are often chosen using heuristics of either distribution shift or augmentation diversity. Inspired by these, we seek to quantify how data augmentation improves model generalization. To this end, we introduce interpretable and easy-to-compute measures: Affinity and Diversity. We find that augmentation performance is predicted not by either of these alone but by jointly optimizing the two.
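Both measures reduce to simple ratios of quantities any training run already produces. The sketch below paraphrases the two definitions in one common form (Affinity from validation accuracies of a clean-trained model, Diversity from final training losses); the exact normalization in the paper may differ, so treat these as illustrative:

```python
def affinity(acc_clean_model_on_aug_val, acc_clean_model_on_clean_val):
    # How much an augmentation shifts the data distribution as seen by
    # a model trained on clean data: values near 1 mean the augmented
    # validation data still "looks like" clean data to that model.
    return acc_clean_model_on_aug_val / acc_clean_model_on_clean_val

def diversity(final_train_loss_aug, final_train_loss_clean):
    # How much harder the augmented training set is to fit, measured
    # as the ratio of final training losses with and without the
    # augmentation applied.
    return final_train_loss_aug / final_train_loss_clean
```

The paper's observation is that neither number alone predicts the generalization gain of an augmentation policy; policies that score well on both tend to be the ones worth keeping.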
ML and NLP Publications in 2019 - Marek Rei
It is about time we once again take a look at the publication statistics of the past year. Nearly all conferences had more attendees and more publications than ever before. For example, NeurIPS had 6,743 submissions and 1,428 accepted papers, eclipsing all previous iterations. Because the conference sold out so fast last year, the organizers had to implement a randomised ticket lottery this time. In this post you will find a number of graphs illustrating the publication patterns of 2019.
- North America > Canada > Ontario > Toronto (0.15)
- Europe > United Kingdom (0.15)
- North America > Canada > Quebec > Montreal (0.05)
- (8 more...)